5 research outputs found

Integration of a Digital Built-in Self-Test for On-Chip Memories

Authors: Luo Xiao; Nouripayam Masoud
Publication venue: Lunds universitet/Institutionen för elektro- och informationsteknik
Publication date: 01/01/2017

The ability to test on-chip circuitry is essential to ASIC implementations today. However, providing functional tests and verification for on-chip (embedded) memories poses many challenges to the designer. An automated built-in self-test block that co-exists with the Design Under Test (DUT) is therefore crucial for comprehensive, efficient and robust testing. The target DUT of this thesis project is the state-of-the-art Ultra Low Power (ULP) dual-port SRAMs designed in the ASIC group of the EIT department at Lund University. The thesis starts from the system RTL modeling and verification of an earlier project, and then goes through the ASIC design phase in 28 nm FD-SOI technology from STMicroelectronics. All scripts used during the ASIC design phase are developed in TCL. The design is implemented with multiple power domains (using the CPF approach and introducing level shifters at crossing points between domains) and multiple clock sources, so that various measurements can be performed with high reliability on different flavours of a dual-port SRAM. The design dramatically reduces the complexity of verifying and measuring integrated memories. This digital integrated circuit (IC) is developed as an application-specific IC (ASIC) chip for functional verification of integrated memories and for measuring them in different respects, such as power consumption. The design is automated and can easily be reconfigured in terms of the actions and data required for testing on-chip memories. In other words, the design automates and optimizes the generation of which data are stored at which memory locations, as well as how those data are treated and interpreted later on. For instance, it refreshes and delivers different operation modes and working patterns to the entire test system in order to fully utilize the integrated memories; this automation is directed by the stimuli applied to the chip.
In addition, the pattern generation of the stimuli is implemented in MATLAB in an automated way. Due to constant advancements in chip manufacturing technology, more devices are squeezed into the same silicon area. As a result, monitoring the additional internal signals introduced by the increased circuit complexity requires more dedicated input/output ports (the physical interface between the chip's internal signals and the outside world), which makes future chip bonding and testing difficult and time-consuming. Additionally, memories usually have more signal pins than other circuit blocks, so the method of handling so many pins must also be taken into account. Thus, a few techniques are adopted in this system to help designers deal with all of these issues. Once the ASIC chip has been fabricated (manufactured) and bonded, the on-chip memories can be tested directly on a printed circuit board in a simple and flexible way: once the test instruction input is loaded into the chip, the system updates the system settings and then generates the internal configurations (parameters), so that all the different operations, modes and instructions related to memory testing are processed automatically.
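The abstract does not say which test algorithm the self-test block runs. As a generic illustration of how a memory BIST decides what data to write to which address and how to interpret the read-back, the sketch below runs the well-known March C- sequence against a simulated bit-wide memory. The `SimpleMemory` class, its stuck-at fault model and the `run_march` helper are hypothetical names introduced here, not taken from the thesis.

```python
# March C- as a list of (address order, operations).
# "w0"/"w1" write a bit; "r0"/"r1" read and expect that bit.
MARCH_C_MINUS = [
    ("up",   ["w0"]),
    ("up",   ["r0", "w1"]),
    ("up",   ["r1", "w0"]),
    ("down", ["r0", "w1"]),
    ("down", ["r1", "w0"]),
    ("up",   ["r0"]),
]


class SimpleMemory:
    """Bit-wide memory model (hypothetical).

    stuck_at maps an address to the bit value that cell is stuck at.
    """

    def __init__(self, size, stuck_at=None):
        self.cells = [0] * size
        self.stuck_at = dict(stuck_at or {})

    def write(self, addr, bit):
        # A stuck cell ignores the written value.
        self.cells[addr] = self.stuck_at.get(addr, bit)

    def read(self, addr):
        return self.stuck_at.get(addr, self.cells[addr])


def run_march(mem, size, elements=MARCH_C_MINUS):
    """Run the march elements; return (element, address, expected, observed) mismatches."""
    failures = []
    for idx, (order, ops) in enumerate(elements):
        addrs = range(size) if order == "up" else range(size - 1, -1, -1)
        for addr in addrs:
            for op in ops:
                bit = int(op[1])
                if op[0] == "w":
                    mem.write(addr, bit)
                else:
                    observed = mem.read(addr)
                    if observed != bit:
                        failures.append((idx, addr, bit, observed))
    return failures


if __name__ == "__main__":
    print(run_march(SimpleMemory(16), 16))                    # fault-free memory
    print(run_march(SimpleMemory(16, stuck_at={5: 1}), 16))   # cell 5 stuck at 1
```

In a hardware BIST the same sequence would typically be produced by an address counter and a small state machine; the list-of-elements form here only makes the ordering easy to read.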

HEVC (H.265) Intra-Frame prediction implementation Using MATLAB

Authors: Nouripayam Masoud; Sheikhipoor Nima
Publication venue: Blekinge Tekniska Högskola, Institutionen för tillämpad signalbehandling
Publication date: 01/01/2014

The HEVC (H.265) standard is the latest enhanced video coding standard, designed to improve on the specifications of its predecessor, MPEG-4 (H.264). According to the H.265 literature, “The main goal of the HEVC standardization effort is to enable significantly improved compression performance relative to existing standards – in the range of 50% bit-rate reduction for equal perceptual video quality” [2]. Intra-picture prediction is a tool in HEVC which “uses some prediction of data spatially from region-to-region within a specific picture, but has no dependence on other pictures in the video frames” [2]. Intra-picture prediction in HEVC is the successor of the intra-frame prediction tool in H.264. Although both take the same approach to spatial prediction of pictures, based on spatial sample prediction followed by transform coding, H.265 intra-frame prediction uses considerably more developed features than H.264. The main features of intra-frame prediction in H.265 can be summarized as follows: a quad-tree block division structure that adapts to the amount of detail in an image; 33 angular modes in angular prediction (just 8 different modes in H.264); and planar prediction for smoothing the sample surfaces [2]. It is worth mentioning that the quad-tree structure of H.265 intra normally uses square blocks of size 4, 8, 16, 32 or 64 (different block sizes depending on the level of granularity in the image), while in H.264 the processing units are macro-blocks of up to 16x16 samples. Moreover, while this video coding standard splits images into one luma and two chroma parts, the thesis focuses only on the implementation of intra-prediction on the luma part of an image. This thesis aims at implementing the intra-frame prediction of HEVC using MATLAB.
The steps of the implementation process are as follows: converting RGB images to the YUV colour space and working on the luma part (Y); splitting images into square blocks ranging from 4 to 64 pixels; implementing the intra-frame prediction algorithm; and comparing the intra-prediction output of H.264 and H.265 for square blocks of size 4 and 16 pixels. The thesis is organised in 3 main sections. The first and second sections cover the literature review and the definition of the concepts of the HEVC standard and intra-prediction, respectively. The third section focuses on the implementation process and the evaluation of the prediction algorithm. Finally, in the evaluation part, statistical graphs derived from the output comparison of H.264 and H.265 intra-prediction for different images demonstrate that H.265 achieves far better image quality than H.264.
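The first steps listed in the abstract (luma extraction, block splitting, and one prediction mode) can be sketched as follows. This is a minimal Python illustration, not the thesis's MATLAB code. The BT.601 luma weights are an assumption, since the abstract does not say which RGB-to-YUV conversion was used. The planar formula follows the HEVC planar mode, without the reference-sample substitution and filtering that a full encoder applies. All function names are hypothetical.

```python
def rgb_to_luma(rgb):
    """Convert a 2-D list of (R, G, B) tuples (0..255) to rounded luma values.

    Uses BT.601 weights, which is an assumption made for this sketch.
    """
    return [[int(round(0.299 * r + 0.587 * g + 0.114 * b)) for (r, g, b) in row]
            for row in rgb]


def split_blocks(luma, n):
    """Yield (top, left, block) for each n x n block.

    Assumes the image height and width are multiples of n.
    """
    height, width = len(luma), len(luma[0])
    for top in range(0, height, n):
        for left in range(0, width, n):
            yield top, left, [row[left:left + n] for row in luma[top:top + n]]


def planar_predict(top, left, top_right, bottom_left):
    """Planar intra prediction for an N x N block (N a power of two).

    top[x]      : reference sample directly above column x
    left[y]     : reference sample directly left of row y
    top_right   : reference sample just right of the last top sample
    bottom_left : reference sample just below the last left sample
    """
    n = len(top)
    shift = n.bit_length()  # equals log2(n) + 1 when n is a power of two
    pred = [[0] * n for _ in range(n)]
    for y in range(n):
        for x in range(n):
            horizontal = (n - 1 - x) * left[y] + (x + 1) * top_right
            vertical = (n - 1 - y) * top[x] + (y + 1) * bottom_left
            pred[y][x] = (horizontal + vertical + n) >> shift
    return pred


if __name__ == "__main__":
    image = [[(255, 255, 255)] * 8 for _ in range(8)]  # all-white 8x8 test image
    luma = rgb_to_luma(image)
    for top, left, block in split_blocks(luma, 4):
        print(top, left, block[0])
    print(planar_predict([100] * 4, [100] * 4, 100, 100))
```

With constant reference samples the planar predictor reproduces that constant, which is a quick sanity check on the weighting.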

Blekinge Institute of Technology

Digitala Vetenskapliga Arkivet - Academic Archive On-line

An Energy-Efficient Near-Memory Computing Architecture for CNN Inference at Cache Level

Authors: Kishorelal Vignajeth Kuttuva; Nouripayam Masoud; Prieto Arturo; Rodrigues Joachim
Publication venue: Institute of Electrical and Electronics Engineers (IEEE)
Publication date: 01/12/2021

A non-von Neumann Near-Memory Computing (NMC) architecture, optimized for CNN inference in edge computing, is integrated in the cache memory sub-system of a microcontroller unit. The NMC co-processor is evaluated using an 8-bit fixed-point quantized CNN model, and achieves an accuracy of 98% on the MNIST dataset. A full inference of the CNN model executed on the NMC processor demonstrates an improvement of more than 34× in performance and 28× in energy efficiency, compared to the baseline scenario of a conventional single-core processor. The design achieves a performance of 1.39 GOPS (at 200 MHz) and an energy efficiency of 49 GOPS/W, with a negligible area overhead of less than 1%.
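Two derived quantities follow from the figures quoted above and can help when comparing against other designs. The short calculation below is only a sketch. It assumes that the 1.39 GOPS throughput and the 49 GOPS/W efficiency refer to the same operating point, which the abstract does not state explicitly. The function names are illustrative.

```python
def ops_per_cycle(throughput_gops, clock_mhz):
    """Average operations completed per clock cycle."""
    return (throughput_gops * 1e9) / (clock_mhz * 1e6)


def implied_power_mw(throughput_gops, efficiency_gops_per_w):
    """Power implied by throughput / efficiency, in milliwatts.

    Only meaningful if both figures describe the same operating point (assumed here).
    """
    return throughput_gops / efficiency_gops_per_w * 1e3


if __name__ == "__main__":
    print(f"{ops_per_cycle(1.39, 200.0):.2f} operations per cycle")
    print(f"{implied_power_mw(1.39, 49.0):.1f} mW implied power")
```

Under that assumption the figures correspond to roughly 7 operations per cycle and a power draw on the order of 28 mW.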

Lund University Publications

An Area Efficient Single-Cycle xVDD Sub-Vth on-Chip Boost Scheme in 28 nm FD-SOI

Authors: Andersson Oskar; Luo Xiao; Mohammadi Babak; Nouripayam Masoud; Rodrigues Joachim
Publication venue: Institute of Electrical and Electronics Engineers (IEEE)
Publication date: 01/01/2017

An on-chip, low-power, and area-efficient charge pump (CP) that generates a multiple of the supply voltage (VDD) in a single clock cycle is presented. The proposed CP utilizes parallel cross-connected CP units, which are implemented using MIM (metal-insulator-metal) capacitors. In the target application, i.e., a sub-threshold SRAM, the capacitors are accommodated on top of the memory banks to remove their area cost, which dominates in a CP realization. In this work, 66 instances of the proposed CP are fully integrated on-chip to assist read and write operations. The design is manufactured in a commercial 28 nm FD-SOI technology and different design parameters were verified by measurements. The results verify an increased system-wise performance and power efficiency at a low area overhead of 3.7%. A performance of 37.5 MHz for a boost ratio of 2×, and an average energy dissipation of 41 fJ per operation, was observed at 0.36 V.
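The measured figures above can be turned into two rough derived numbers. The sketch below makes two assumptions that the abstract does not confirm. First, it treats the 2× boost as ideal, whereas a real charge pump delivers somewhat less than the nominal multiple. Second, it assumes one operation per clock cycle at 37.5 MHz when converting energy per operation into average power. The function names are illustrative.

```python
def ideal_boosted_voltage(vdd_volts, boost_ratio):
    """Nominal boosted voltage, ignoring charge-sharing and leakage losses."""
    return vdd_volts * boost_ratio


def average_power_uw(energy_fj_per_op, frequency_mhz):
    """Average power in microwatts, assuming one operation per clock cycle."""
    return (energy_fj_per_op * 1e-15) * (frequency_mhz * 1e6) * 1e6


if __name__ == "__main__":
    print(f"{ideal_boosted_voltage(0.36, 2.0):.2f} V nominal boosted level")
    print(f"{average_power_uw(41.0, 37.5):.2f} uW average power")
```

Under these assumptions the nominal boosted level is about 0.72 V and the average power is on the order of 1.5 µW.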

Lund University Publications

A Low-Voltage 6T Dual-Port Configured SRAM with Wordline Boost in 28 nm FD-SOI

Authors: Johansson Tom; Luo Xiao; Mohammadi Babak; Nouripayam Masoud; Rodrigues Joachim
Publication venue: Institute of Electrical and Electronics Engineers (IEEE)
Publication date: 13/09/2021

A 32 Kb dual-port low-voltage SRAM in 28 nm FD-SOI, featuring foundry-supplied high-density 6T bitcells, is presented. Dual-port configurability is realized by a unique dual-rail architecture, utilizing boost techniques that guarantee reliable operation at low voltage. The area cost of the array is 62% lower than that of widely used 8T two-port or dual-port SRAM arrays. The SRAM operates reliably in the low-voltage regime, and an access rate of 1 MHz is measured at a VMIN of 0.29 V. The highest energy efficiency of 1.35 fJ/bit-access is obtained at an access rate of 80 MHz and a VDD of 0.54 V.

Lund University Publications